The Download: Making AI Work, and why the Moltbook hype is similar to Pokémon
Are you interested in learning more about how AI is being used? We've launched a new weekly newsletter series exploring just that: digging into how generative AI is being deployed across sectors and what professionals need to know to apply it in their everyday work. Each edition of Making AI Work begins with a case study examining a specific use of AI in a given industry. Then we take a deeper look at the AI tool involved, with more context about how other companies or sectors are employing that same tool or system. Finally, we end with action-oriented tips to help you apply it yourself. The first edition looks at how AI is changing health care, exploring the future of medical note-taking through the Microsoft Copilot tool used by doctors at Vanderbilt University Medical Center.
- North America > United States > New York (0.05)
- North America > United States > Massachusetts (0.05)
- Europe > Iceland (0.05)
- (2 more...)
- Health & Medicine > Health Care Providers & Services (0.69)
- Leisure & Entertainment > Games > Computer Games (0.42)
SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
Zhang, Xiaoyang, Li, Jinjiang, Fan, Guodong, Ju, Yakun, Fan, Linwei, Liu, Jun, Kot, Alex C.
Infrared and visible image fusion (IVIF) aims to combine the thermal radiation information from infrared images with the rich texture details from visible images to enhance perceptual capabilities for downstream visual tasks. However, existing methods often fail to preserve key targets due to a lack of deep semantic understanding of the scene, while the fusion process itself can also introduce artifacts and detail loss, severely compromising both image quality and task performance. To address these issues, this paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM), to achieve high-fidelity and semantically-aware image fusion. The core of our method is to utilize high-quality semantic masks generated by SAM as explicit priors to guide the optimization of the fusion process via a conditional diffusion model. Specifically, the framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks from SAM jointly with the preliminary fused image as a condition to drive the diffusion model's coarse-to-fine denoising generation. This ensures the fusion process not only has explicit semantic directionality but also guarantees the high fidelity of the final result. Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations, as well as in its adaptability to downstream tasks, providing a powerful solution to the core challenges in image fusion. The code of SGDFuse is available at https://github.com/boshizhang123/SGDFuse.
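The mask-guided, two-stage idea can be illustrated with a toy computation. This is a sketch, not the paper's diffusion model: `preliminary_fuse` stands in for the multi-modal fusion stage, and `mask_guided_refine` stands in for one conditional correction step in which the SAM mask prioritizes key regions. All function names and values are illustrative.

```python
import numpy as np

def preliminary_fuse(ir, vis, w=0.5):
    # Naive stand-in for stage one (multi-modal feature fusion):
    # a weighted average of infrared and visible intensities.
    return w * ir + (1 - w) * vis

def mask_guided_refine(fused, mask, target, strength=0.5):
    # Stand-in for one mask-conditioned refinement step: pull the fused
    # image toward the target only inside the SAM mask, mimicking how
    # the semantic prior steers the denoising toward key targets.
    return fused + strength * mask * (target - fused)

ir = np.array([[0.9, 0.1], [0.8, 0.2]])    # thermal intensities
vis = np.array([[0.3, 0.7], [0.4, 0.6]])   # visible intensities
mask = np.array([[1.0, 0.0], [1.0, 0.0]])  # SAM: left column is a key target

fused = preliminary_fuse(ir, vis)
refined = mask_guided_refine(fused, mask, target=ir)
```

Inside the mask the result moves toward the infrared target; outside it the preliminary fusion is left untouched, which is the intuition behind "explicit semantic directionality."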
- Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
- Asia > China > Shandong Province > Yantai (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- (2 more...)
Survey of Vision-Language-Action Models for Embodied Manipulation
Li, Haoran, Chen, Yuhui, Cui, Wenbo, Liu, Weiheng, Liu, Kai, Zhou, Mingcai, Zhang, Zhengtao, Zhao, Dongbin
Embodied intelligence systems, which enhance agent capabilities through continuous environment interaction, have garnered significant attention from both academia and industry. Vision-Language-Action (VLA) models, inspired by advances in large foundation models, serve as universal robotic control frameworks that substantially improve agent-environment interaction in embodied intelligence systems, broadening the application scenarios for embodied AI robots. This survey comprehensively reviews VLA models for embodied manipulation. First, it chronicles the developmental trajectory of VLA architectures. It then analyzes current research across five critical dimensions: VLA model structures, training datasets, pre-training methods, post-training methods, and model evaluation. Finally, it synthesizes key challenges in VLA development and real-world deployment and outlines promising future research directions.
Potential Indicator for Continuous Emotion Arousal by Dynamic Neural Synchrony
Pan, Guandong, Wu, Zhaobang, Yang, Yaqian, Wang, Xin, Liu, Longzhao, Zheng, Zhiming, Tang, Shaoting
The need for automatic and high-quality emotion annotation is paramount in applications such as continuous emotion recognition and video highlight detection, yet achieving this through manual human annotation is challenging. Inspired by inter-subject correlation (ISC) as used in neuroscience, this study introduces a novel electroencephalography (EEG)-based ISC methodology that leverages a single-electrode, feature-based dynamic approach. Our contributions are threefold. First, we re-identify two potent emotion features suitable for classifying emotions: first-order difference (FD) and differential entropy (DE). Second, through overall correlation analysis, we demonstrate the heterogeneous synchronized performance of electrodes; this performance aligns with neural emotion patterns established in prior studies, validating the effectiveness of our approach. Third, by employing a sliding-window correlation technique, we show the significant consistency of dynamic ISCs across features and key electrodes in each analyzed film clip. Our findings indicate the method's reliability in capturing consistent, dynamic shared neural synchrony among individuals, triggered by evocative film stimuli, underscoring its potential to serve as an indicator of continuous human emotion arousal. The implications are significant for affective computing and the broader neuroscience field, suggesting a streamlined and effective tool for emotion analysis in real-world applications.
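The sliding-window correlation at the heart of the dynamic ISC approach can be sketched in a few lines. This is a minimal illustration on synthetic signals, not the paper's pipeline: the two arrays stand in for one extracted feature (e.g. DE) from the same electrode of two subjects, and window sizes are arbitrary.

```python
import numpy as np

def sliding_isc(x, y, win, step=1):
    """Dynamic inter-subject correlation: Pearson r between two
    subjects' single-electrode feature series in sliding windows."""
    out = []
    for s in range(0, len(x) - win + 1, step):
        out.append(np.corrcoef(x[s:s + win], y[s:s + win])[0, 1])
    return np.array(out)

# Synthetic stand-ins for two subjects watching the same film clip:
# a shared slow component plus subject-specific noise.
t = np.linspace(0, 4 * np.pi, 200)
subj_a = np.sin(t)
subj_b = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)

isc = sliding_isc(subj_a, subj_b, win=50, step=25)
```

A consistently high `isc` trace across subjects is what the study reads as shared, stimulus-driven neural synchrony.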
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Shandong Province > Yantai (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
Contrastive Analysis of Constituent Order Preferences Within Adverbial Roles in English and Chinese News: A Large-Language-Model-Driven Approach
Based on comparable English-Chinese news corpora annotated by a large language model (LLM), this paper explores differences in constituent order between English and Chinese news from the perspective of functional chunks with adverbial roles, analyzing their typical positional preferences and distribution patterns. It finds that: (1) English news prefers a linear, core-information-first narrative, with functional chunks mostly post-positioned, while Chinese news prefers a background-first mode of presentation, with functional chunks often pre-positioned; (2) in SVO structures, both English and Chinese news show differences in the distribution of functional chunks, but the Chinese tendency toward pre-positioning is more pronounced, while the English tendency toward post-positioning is relatively mild; (3) when functional chunks co-occur, both English and Chinese news show high flexibility, with order adjustments driven by informational and pragmatic purposes. The study reveals that word order exhibits both systematic preferences and dynamic adaptability, providing new empirical support for the contrastive study of English-Chinese information structure.
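Once the LLM has tagged each adverbial chunk as pre- or post-verbal, computing the positional preference per language is a simple frequency count. The rows below are invented toy annotations, not the paper's corpus or label scheme:

```python
from collections import Counter

# Toy LLM-annotated clauses: (language, chunk role, position relative
# to the verb). Labels are illustrative only.
annotated = [
    ("zh", "time", "pre"), ("zh", "place", "pre"), ("zh", "manner", "pre"),
    ("zh", "time", "pre"), ("en", "time", "post"), ("en", "place", "post"),
    ("en", "manner", "post"), ("en", "time", "pre"),
]

def position_preference(rows, lang):
    # Proportion of pre- vs post-positioned chunks for one language.
    counts = Counter(pos for l, _, pos in rows if l == lang)
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}

zh = position_preference(annotated, "zh")
en = position_preference(annotated, "en")
```

On this toy data the Chinese chunks are uniformly pre-positioned while the English ones lean post-positioned, mirroring the pattern the paper reports.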
- Asia > China > Beijing > Beijing (0.41)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (8 more...)
- Government (0.93)
- Media > News (0.67)
A Survey of Context Engineering for Large Language Models
Mei, Lingrui, Yao, Jiayu, Ge, Yuyao, Wang, Yiwei, Bi, Baolong, Cai, Yujun, Liu, Jiazhi, Li, Mingyu, Li, Zhong-Zhi, Zhang, Duzhen, Zhou, Chenlin, Mao, Jiayi, Xia, Tianze, Guo, Jiafeng, Liu, Shenghua
The performance of Large Language Models (LLMs) is fundamentally determined by the contextual information provided during inference. This survey introduces Context Engineering, a formal discipline that transcends simple prompt design to encompass the systematic optimization of information payloads for LLMs. We present a comprehensive taxonomy decomposing Context Engineering into its foundational components and the sophisticated implementations that integrate them into intelligent systems. We first examine the foundational components: context retrieval and generation, context processing, and context management. We then explore how these components are architecturally integrated to create sophisticated system implementations: retrieval-augmented generation (RAG), memory systems, tool-integrated reasoning, and multi-agent systems. Through this systematic analysis of over 1,400 research papers, our survey not only establishes a technical roadmap for the field but also reveals a critical research gap: a fundamental asymmetry between what models can understand and what they can generate. While current models, augmented by advanced context engineering, demonstrate remarkable proficiency in understanding complex contexts, they exhibit pronounced limitations in generating equally sophisticated, long-form outputs. Addressing this gap is a defining priority for future research. Ultimately, this survey provides a unified framework for both researchers and engineers advancing context-aware AI.
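The "context retrieval" component the survey describes can be sketched in its simplest form: score candidate passages against the query and place the top-k into the prompt. This is a bag-of-words cosine toy, far from a production RAG retriever; all strings are illustrative.

```python
import math
from collections import Counter

def bow(text):
    # Bag-of-words term counts, lowercased.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, passages, k=1):
    # Rank passages by similarity to the query; return the top-k
    # to be inserted into the model's context window.
    q = bow(query)
    return sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]

docs = [
    "context engineering optimizes the information payload given to an LLM",
    "pointer networks decode permutations over input elements",
]
top = retrieve("how to optimize LLM context payloads", docs, k=1)
```

Real systems swap the bag-of-words scorer for dense embeddings, but the retrieve-then-stuff-the-context shape is the same.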
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (8 more...)
- Health & Medicine (1.00)
- Education (0.92)
- Information Technology > Security & Privacy (0.45)
Continuous Multi-Task Pre-training for Malicious URL Detection and Webpage Classification
Li, Yujie, Liu, Yiwei, Li, Peiyue, Jia, Yifan, Wang, Yanbin
Malicious URL detection and webpage classification are critical tasks in cybersecurity and information management. In recent years, extensive research has explored using BERT or similar language models to replace traditional machine learning methods for detecting malicious URLs and classifying webpages. While previous studies show promising results, they often apply existing language models to these tasks without accounting for the inherent differences in domain data (e.g., URLs being loosely structured and semantically sparse compared to text), leaving room for performance improvement. Furthermore, current approaches focus on single tasks and have not been tested in multi-task scenarios. To address these challenges, we propose urlBERT, a pre-trained URL encoder that leverages a Transformer to encode foundational knowledge from billions of unlabeled URLs. To achieve this, we use five unsupervised pre-training tasks to capture multi-level information about URL lexicon, syntax, and semantics, and to generate contrastive and adversarial representations. Furthermore, to avoid competition and interference among pre-training tasks, we propose a grouped sequential learning method to ensure effective training across tasks. Finally, we leverage a two-stage fine-tuning approach to improve the training stability and efficiency of the task model. To assess the multitasking potential of urlBERT, we fine-tune the task model in both single-task and multi-task modes: the former creates a classification model for a single task, while the latter builds a classification model capable of handling multiple tasks. We evaluate urlBERT on three downstream tasks: phishing URL detection, advertising URL detection, and webpage classification. The results demonstrate that urlBERT outperforms standard pre-trained models and that its multi-task mode can address the real-world demands of multitasking.
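The grouped sequential learning idea — running the pre-training objectives in groups, one phase at a time, instead of all at once — reduces to a training schedule. The sketch below only builds such a schedule; the task names are invented placeholders, not the paper's actual five objectives.

```python
def grouped_sequential(groups, steps_per_group=2):
    """Build a phase-by-phase schedule: every task in group 0 is
    trained before any task in group 1, and so on, so objectives in
    different groups never compete within the same phase."""
    schedule = []
    for group in groups:
        for step in range(steps_per_group):
            for task in group:
                schedule.append((task, step))
    return schedule

# Hypothetical grouping of the five unsupervised objectives.
groups = [
    ["masked_token", "token_order"],             # lexical/syntactic group
    ["contrastive", "adversarial", "semantic"],  # representation group
]
plan = grouped_sequential(groups)
```

Within a group the tasks still interleave step by step, so closely related objectives can share gradients while distant ones are kept in separate phases.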
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Shandong Province > Yantai (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
PAPN: Proximity Attention Encoder and Pointer Network Decoder for Parcel Pickup Route Prediction
Denis, Hansi, Mercelis, Siegfried, Luong, Ngoc-Quang
Optimization of last-mile delivery and first-mile pickup of parcels is an integral part of the broader logistics optimization pipeline, as it yields cost and resource efficiency as well as heightened service quality. Such optimization requires accurate route and time prediction systems that can adapt to different scenarios in advance. This work tackles the first building block, route prediction. It introduces a novel Proximity Attention mechanism in an encoder-decoder architecture that uses a Pointer Network in the decoding process (Proximity Attention Encoder and Pointer Network decoder: PAPN), leveraging the underlying connections between the different visitable pickup positions at each timestep. This local attention process is coupled with global context computation via a multi-head-attention Transformer encoder. The resulting global context is then mixed with an aggregated version of the local embedding, combining global and local attention for complete modeling of the problem. Proximity attention is also used in the decoding process to skew predictions toward the locations with the highest attention scores, thus using the inter-connectivity of locations as a basis for next-location prediction. The method is trained, validated, and tested on LaDE[1], a large industry-level dataset of real-world, large-scale last-mile delivery and first-mile pickup. The approach shows noticeable promise, outperforming all state-of-the-art supervised systems on most benchmark metrics for this dataset while remaining competitive with the best-performing reinforcement-learning method, DRL4Route[2].
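The core "skew predictions toward nearby locations" step can be illustrated as a distance-penalized softmax. This is a one-step toy, not the PAPN decoder: the raw scores, distances, and temperature are invented for illustration.

```python
import numpy as np

def proximity_attention(scores, dist, tau=1.0):
    """Bias decoder attention by proximity: subtract a distance
    penalty from the raw scores before the softmax, so closer
    unvisited stops receive higher next-location probability."""
    z = scores - dist / tau
    e = np.exp(z - z.max())  # stable softmax
    return e / e.sum()

raw = np.array([1.0, 1.0, 1.0])   # decoder scores, equal on purpose
dist = np.array([0.2, 2.0, 5.0])  # distance from the current position
p = proximity_attention(raw, dist)
```

With identical raw scores, the probability mass ends up ordered by proximity alone, which is exactly the inductive bias the mechanism adds on top of the pointer network's learned scores.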
- North America > United States > Texas > El Paso County > El Paso (0.05)
- Asia > China > Shandong Province > Yantai (0.05)
- Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)
- (5 more...)
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
Cheng, Xianfu, Zhang, Wei, Zhang, Shiwei, Yang, Jian, Guan, Xiangyuan, Wu, Xianjie, Li, Xiang, Zhang, Ge, Liu, Jiaheng, Mai, Yuying, Zeng, Yutao, Wen, Zhoufutu, Jin, Ke, Wang, Baorui, Zhou, Weixiao, Lu, Yunhong, Li, Tongliang, Huang, Wenhao, Li, Zhoujun
The increasing application of multi-modal large language models (MLLMs) across various sectors has spotlighted the importance of their output reliability and accuracy, particularly their ability to produce content grounded in factual information (e.g., common and domain-specific knowledge). In this work, we introduce SimpleVQA, the first comprehensive multi-modal benchmark to evaluate the factuality of MLLMs when answering natural-language short questions. SimpleVQA is characterized by six key features: it covers multiple tasks and multiple scenarios, ensures high-quality and challenging queries, maintains static and timeless reference answers, and is straightforward to evaluate. Our approach involves categorizing visual question-answering items into 9 tasks around objective events or common knowledge, situated within 9 topics. Rigorous quality-control processes guarantee high-quality, concise, and clear answers, facilitating evaluation with minimal variance via an LLM-as-a-judge scoring system. Using SimpleVQA, we perform a comprehensive assessment of 18 leading MLLMs and 8 text-only LLMs, delving into their image-comprehension and text-generation abilities by identifying and analyzing error cases.
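The scoring loop of a benchmark like this is simple once the judge is fixed. As a stand-in for the LLM-as-a-judge, the sketch below uses a normalized string match; the real system prompts a strong LLM to grade each answer, and all examples here are invented.

```python
def judge(pred, reference):
    """Stand-in for the LLM-as-a-judge: a normalized exact match.
    The real benchmark asks an LLM to grade semantic correctness."""
    norm = lambda s: " ".join(s.lower().split())
    return 1.0 if norm(pred) == norm(reference) else 0.0

# Toy predictions vs. static reference answers.
scores = [
    judge("Eiffel Tower", "eiffel  tower"),  # correct up to case/spacing
    judge("Paris", "Lyon"),                  # wrong
]
accuracy = sum(scores) / len(scores)
```

Because the reference answers are short, static, and unambiguous, the judge's variance stays low, which is what makes the benchmark "straightforward to evaluate."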
- North America > United States (0.28)
- Europe > Italy > Sardinia (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (4 more...)
- Research Report (0.64)
- Workflow (0.46)
Multi-Order Hyperbolic Graph Convolution and Aggregated Attention for Social Event Detection
Liu, Yao, Liu, Zhilan, Tan, Tien Ping, Li, Yuxin
Social event detection (SED) is a task focused on identifying specific real-world events and has broad applications across various domains. It is integral to many mobile applications with social features, including major platforms like Twitter, Weibo, and Facebook. By enabling the analysis of social events, SED provides valuable insights for businesses to understand consumer preferences and supports public services in handling emergencies and disaster management. Due to the hierarchical structure of event detection data, traditional approaches in Euclidean space often fall short in capturing the complexity of such relationships. While existing methods in both Euclidean and hyperbolic spaces have shown promising results, they tend to overlook multi-order relationships between events. To address these limitations, this paper introduces a novel framework, Multi-Order Hyperbolic Graph Convolution with Aggregated Attention (MOHGCAA), designed to enhance the performance of SED. Experimental results demonstrate significant improvements under both supervised and unsupervised settings. To further validate the effectiveness and robustness of the proposed framework, we conducted extensive evaluations across multiple datasets, confirming its superiority in tackling common challenges in social event detection.
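A common building block of hyperbolic graph convolution is aggregating neighbor features via the tangent space: map points from the Poincaré ball to the tangent space at the origin, average there, and map back. The sketch below shows that step for curvature -1; it is a generic illustration of the operation, not the MOHGCAA model, and the inputs are arbitrary.

```python
import numpy as np

def log0(x):
    # Logarithmic map at the origin of the Poincare ball (curvature -1):
    # ball point -> tangent vector.
    n = np.linalg.norm(x)
    return np.arctanh(n) * x / n if n > 0 else x

def exp0(v):
    # Exponential map at the origin: tangent vector -> ball point.
    n = np.linalg.norm(v)
    return np.tanh(n) * v / n if n > 0 else v

def hyper_aggregate(neighbors):
    # One hyperbolic aggregation step: average in tangent space,
    # then project back into the ball.
    tangent_mean = np.mean([log0(x) for x in neighbors], axis=0)
    return exp0(tangent_mean)

nbrs = [np.array([0.3, 0.0]), np.array([0.0, 0.3])]
agg = hyper_aggregate(nbrs)
```

The round trip through the tangent space keeps the aggregate inside the unit ball, which is what lets hierarchical (tree-like) event structure be represented with low distortion.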
- Asia > China > Sichuan Province > Chengdu (0.05)
- Asia > Malaysia > Penang (0.04)
- Europe > United Kingdom > England > Surrey (0.04)
- (4 more...)